FOREWORD


Military  decisions  hinge  on  an  ability  to  forecast  the  developing 
situation. A sufficient understanding of the enemy's response behavior
sets  the  stage  for  causing  the  desired  response.  Clearly,  prediction 
and  control  are  essential  aspects  of  the  decision  process.  And  yet, 
classical  techniques  treat  only  a  limited  portion  of  such  real  world 
situations.  Least  mean  squared  error  reduction  is  rarely  the  appropriate 
criterion. The "plant" or transducer is neither linear nor passive.

The usual approach is to extend the classical techniques through
quasi-linear approximations and higher ordered differential functions.
But such progress leads to ever more complex formulations that are
increasingly unsuitable to primitive computation, and furthermore
requires an in-depth understanding of the physical process which is
not always available.

What  is  needed  is  an  alternate  formulation...  a  new  beginning 
based  on  a  different  view  of  the  logical  process  required  for  prediction, 
control,  and  other  aspects  of  intelligence.  Evolutionary  programming 
constitutes  such  an  approach...  and  simple  high  speed  parallel  processing 
will  be  notably  suitable  for  such  fast  time  simulation  of  the  evolutionary 
process. The task at hand is to devise such programming in support of
military prediction, control, and decision processes while, at the same
time, extending our intellectual capability.


INTRODUCTION 


The  first  Quarterly  Progress  Report  described  experiments  in  the 
prediction  of  diverse  time  series  against  various  payoff  functions  through 
the  use  of  Evolutionary  Programming.  It  is  now  worthwhile  to  resolve  some 
uncertainty concerning the structure of this program and to dimensionalize
the  use  of  this  predictive  capability.  Given  an  arbitrary  prediction 
problem,  how  can  one  determine  the  manner  in  which  it  should  be  treated 
through  Evolutionary  Programming? 

Some  additional  experiments  were  performed  to  clarify  uncertainties 
and  to  prepare  for  conducting  a  significant  number  of  larger  scale 
experiments  to  be  performed  on  the  NASA  Ames  Cray  computer. 

Various authors have suggested the use of crossover as a mechanism
for improving simulated evolution. Some experiments were performed to
explore the worth of this concept. Another series of experiments was
conducted to demonstrate Evolutionary Programming within the context of
identifying an arbitrary unknown transducer. Some difficult combinatorial
problems such as the classic Traveling Salesman Problem can be addressed
through the evolution of less complex logics (for example, single state
machines). A demonstration in this regard is contained in the Appendix.


DISCUSSION 


Additional  Experiments  on  Prediction 
Increased  Machine  Size  and  the  Saving  of  Equal-Worth  Offspring 

In view of the previous experiments, it remains unclear whether
it is worthwhile to save offspring that are of the same worth as
their parent. It is also of interest to enquire as to the worth
of allowing the evolution of larger finite state machines. The previously
used evolutionary program was therefore altered to save offspring of equal
worth to the parent and to permit finite state machines of up to fifty
states, in the hope that an increase in size would improve the predictive
capability.

The first experiment required prediction of the same binary cyclic
environment used in the original experiments (101110011101). As expected,
a perfect predictor was found, but this logic
was not discovered any faster. Figure 1 shows the predictive fit, while
Figure  2  indicates  the  cumulative  percent  correct  prediction.  Note  that 
perfect  predictions  were  always  made  after  the  170th  prediction.  Figure  3 
indicates  that  adding  a  state  is  a  beneficial  form  of  mutation.  The  rate 
of  increase  in  the  size  of  the  machines  follows  the  probability  of  adding 
a  state.  Note  that  as  the  evolutionary  process  proceeds,  the  percentage 
of  such  beneficial  mutations  (as  compared  with  all  possible  mutations) 
gets  smaller,  and  therefore  the  evolutionary  process  "slows  down."  This 
suggests  that  after  a  given  complexity  is  reached,  it  might  be  appropriate 
to  include  a  penalty  for  that  complexity  or  an  increase  in  the  probability 
of  deleting  a  state. 
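
The procedure described above can be illustrated with a minimal sketch. It is not the program used in these experiments: it assumes a single parent producing one offspring per generation, and it assumes the five modes of mutation are changing an output, changing a transition, changing the start state, adding a state, and deleting a state. All function names are hypothetical. The `keep_equal` flag switches between saving only superior offspring and also saving offspring of equal worth.

```python
import random

ENV = [int(c) for c in "101110011101"]  # binary cyclic environment quoted in the text

def new_machine(n_symbols=2, rng=random):
    # One-state Mealy machine: table[state][input] = (output, next_state)
    return {"start": 0,
            "table": [[(rng.randrange(n_symbols), 0) for _ in range(n_symbols)]]}

def predictive_fit(machine, history):
    """Fraction of symbols correctly predicted from their predecessor."""
    state, hits = machine["start"], 0
    for current, following in zip(history, history[1:]):
        output, state = machine["table"][state][current]
        hits += (output == following)
    return hits / max(1, len(history) - 1)

def mutate(machine, n_symbols=2, max_states=50, rng=random):
    child = {"start": machine["start"],
             "table": [row[:] for row in machine["table"]]}
    table, n = child["table"], len(machine["table"])
    # Assumed set of five mutation modes, chosen uniformly here.
    mode = rng.choice(["output", "transition", "start", "add", "delete"])
    if mode == "add" and n < max_states:
        table.append([(rng.randrange(n_symbols), rng.randrange(n + 1))
                      for _ in range(n_symbols)])
    elif mode == "delete" and n > 1:
        dead = rng.randrange(n)
        table.pop(dead)

        def fix(s):  # re-point references to the deleted state
            return rng.randrange(n - 1) if s == dead else s - (s > dead)

        for row in table:
            row[:] = [(out, fix(nxt)) for out, nxt in row]
        child["start"] = fix(child["start"])
    elif mode == "start":
        child["start"] = rng.randrange(n)
    else:  # "output" or "transition"; a blocked add/delete also lands here
        s, a = rng.randrange(n), rng.randrange(n_symbols)
        out, nxt = table[s][a]
        if mode == "output":
            out = rng.randrange(n_symbols)
        else:
            nxt = rng.randrange(n)
        table[s][a] = (out, nxt)
    return child

def evolve(history, generations=200, keep_equal=True, seed=0):
    rng = random.Random(seed)
    parent = new_machine(rng=rng)
    best = predictive_fit(parent, history)
    for _ in range(generations):
        child = mutate(parent, rng=rng)
        score = predictive_fit(child, history)
        # keep_equal=True also saves offspring of equal worth to the parent
        if score > best or (keep_equal and score == best):
            parent, best = child, score
    return parent, best
```

Calling `evolve(ENV * 4)` with and without `keep_equal` gives a rough feel for how saving equal-worth offspring lets machine size drift upward even when the score does not improve.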

The  Prediction  of  Noisy  Environments 

Ten  experiments  were  conducted  to  examine  the  impact  of  noise  on  the 
predictive  capability  of  the  evolutionary  process.  The  same  binary  cyclic 
environment was corrupted by having each symbol change, this with a
probability of one-sixth. Figure 4 shows the best of these experiments
with  the  predictive  fit  score  being  very  close  to  the  expected  value 
for  a  machine  that  perfectly  fits  the  underlying  cycle.  Note  that  the 
excursion  of  the  trace  above  the  dashed  line  indicates  that  the  evolving 
finite  state  machines  were  fitting  the  experienced  noise.  As  shown  in 
Figure  5,  the  evolution  achieves  about  65  percent  correct  predictions. 

The size of the machines grew rapidly to the upper limit of twenty-five
states because of the noise, see Figure 6.

Figure  7  shows  the  worst  of  the  ten  experiments.  Here  the  predictive 
fit  score  was  about  ten  percent  below  what  could  be  expected.  After  196 
predictions,  there  was  no  evidence  that  the  evolutionary  process  could 
predict  better  than  fifty  percent,  see  Figure  8.  Figure  9  again  reveals 
a  rapid  increase  in  the  size  of  the  evolving  finite  state  machines. 

Figures  10  and  11  indicate  the  mean  and  two  sigma  limits  for  the  ten 
experiments.  The  mean  shows  a  steady  but  slow  increase  (probably  to  83  1/3 
percent,  the  highest  possible  percent  correct  for  this  noisy  environment). 
The  two  sigma  "confidence  limits"  converge  as  expected.  The  mean  worth 
(average  predictive  fit)  was  very  close  to  the  maximum  expected  value 
and  had  relatively  narrow  confidence  limits,  see  Figures  12  and  13.  Note 
that  if  the  environment  is  noisy  and  an  all-or-none  payoff  function  is 
imposed,  the  evolutionary  process  constructs  machines  that  fit  both  signal 
and  noise  with  equal  weight.  This  is  particularly  apparent  if  the 
environment  is  binary. 
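
The 83 1/3 percent ceiling can be checked numerically. The sketch below assumes the noise model stated above (each symbol of the binary cycle flips independently with probability one-sixth) and scores a hypothetical predictor that always announces the uncorrupted symbol; the function names and the sample size are illustrative only.

```python
import random

CYCLE = [int(c) for c in "101110011101"]  # underlying binary cycle

def noisy_environment(n, flip_prob=1 / 6, seed=0):
    """Return (clean, noisy) sequences of length n."""
    rng = random.Random(seed)
    clean = [CYCLE[i % len(CYCLE)] for i in range(n)]
    noisy = [1 - s if rng.random() < flip_prob else s for s in clean]
    return clean, noisy

def ceiling_score(n=60000, flip_prob=1 / 6, seed=0):
    """Score of a predictor that always announces the uncorrupted symbol."""
    clean, noisy = noisy_environment(n, flip_prob, seed)
    return sum(c == o for c, o in zip(clean, noisy)) / n
```

Under an all-or-none payoff the score of such a predictor tends to 1 - 1/6 = 5/6, which is the limit referred to in the text.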

A  brief  experiment  required  the  prediction  of  a  noisy  four-symbol 
environment. Here the noise was quasi-Gaussian... a 67 percent chance
of  altering  each  symbol  ±  1  and  a  33  percent  chance  of  altering  each 
symbol  ±  2.  See  Figure  14.  A  linear  payoff  function  was  used  to  provide 
some  "incentive"  for  predicting  close  to  the  correct  true  symbol.  This 
encourages  discovering  the  signal  as  opposed  to  the  noise.  Figures  15 
and  16  indicate  the  predictive  fit  and  prediction  capability  generated 
through  use  of  this  less  severe  payoff  function.  Since  the  four-symbol 
environment  was  known  to  be  simple  (a  two-state  machine  would  perfectly 
predict  the  cycle),  a  0.01  penalty  per  state  was  imposed. 
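
The difference between the two payoff functions, and the effect of the per-state penalty, can be sketched as follows. The exact form of the linear payoff is not given in this report; the version below (full credit for a correct symbol, falling off linearly with distance) is an assumption, as are the function names.

```python
def all_or_none(predicted, actual):
    """Full credit only for an exactly correct prediction."""
    return 1.0 if predicted == actual else 0.0

def linear_payoff(predicted, actual, n_symbols=4):
    """Assumed form: credit falls off linearly with distance from the true symbol."""
    return 1.0 - abs(predicted - actual) / (n_symbols - 1)

def worth(predictions, actuals, n_states, payoff=linear_payoff,
          penalty_per_state=0.01):
    """Mean payoff over the history, less a complexity penalty per state."""
    mean = sum(payoff(p, a) for p, a in zip(predictions, actuals)) / len(actuals)
    return mean - penalty_per_state * n_states
```

With this scoring, a two-state machine that predicts the cycle perfectly has a worth of 0.98, while a twenty-five-state machine must gain more than 0.23 in mean payoff to be preferred.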

Nine experiments were performed on the above-referenced binary cyclic
environment wherein offspring equal in predictive fit to the parent were
saved, and a 0.01 penalty per state was imposed. In one experiment, an
eight-state perfect predictor machine was found on the 63rd prediction,
see Figure 17. In all, 12,603 offspring were evaluated in the 196 predictions.

Figure  18  shows  that  the  machines  greatly  increase  in  complexity  as 
required  to  "solve  this  problem."  Figures  19  and  20  show  that  the  mean  and 
two sigma limits were slightly better than those previously obtained,
see the first Quarterly Progress Report.

Predicting  the  Fibonacci  Series,  Modulo-10 

Having  determined  that  saving  offspring  of  equal  worth  to  the  parent 
can  be  of  benefit,  an  experiment  was  performed  to  reveal  the  predictive 
capability  of  the  evolutionary  program  with  respect  to  an  environment 
generated  by  the  Fibonacci  Series,  Modulo-10.  Figures  21  and  22  indicate 
significant  improvement  over  the  previously  reported  results.  The 
evolutionary  process  was  now  able  to  "learn"  more  of  the  60-symbol  long 
cycle.  After  3,020  predictions,  a  42  percent  cumulative  predictive  score 
was  achieved.  In  principle,  this  environment  would  eventually  be  perfectly 
predicted  by  a  ten-state  machine. 
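
The 60-symbol cycle and the sufficiency of ten states can both be checked directly. The sketch below assumes the series starts 0, 1, 1, 2, ... and that the predictor is a Mealy machine whose state is simply the previously seen symbol; the function names are illustrative.

```python
def fibonacci_mod10(n):
    """First n terms of the Fibonacci series, modulo 10."""
    seq, a, b = [], 0, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, (a + b) % 10
    return seq

def cycle_length(limit=1000):
    """Smallest period of the modulo-10 series, found by direct comparison."""
    seq = fibonacci_mod10(limit)
    for period in range(1, limit // 2):
        if all(seq[i] == seq[i + period] for i in range(limit - period)):
            return period
    return None

def ten_state_predictor_score(seq):
    """State = previous symbol (ten possible states), input = current symbol,
    output = (state + input) mod 10, next state = input."""
    state, hits = seq[0], 0
    for current, following in zip(seq[1:], seq[2:]):
        hits += ((state + current) % 10 == following)
        state = current
    return hits / (len(seq) - 2)
```

Because the next symbol depends only on the two most recent symbols, remembering one symbol in the state and reading the other as input is enough, which is why ten states suffice.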


Seven  experiments  were  performed  on  the  typical  "I.Q.  test" 
environment  101001000...  again  saving  offspring  of  equal  worth;  all  other 
conditions  being  the  same  as  those  indicated  in  the  first  Quarterly  Progress 
Report.  Figures  23  through  29  show  these  experiments  in  terms  of  percent 
correct  and  demonstrate  the  adaptation  with  respect  to  an  ever-changing 
environment. In the first experiment, the "discovery" that it is better
to predict the zeros was not yet demonstrated because the process had successfully
predicted the early ones. The second experiment shows a distinct
improvement after 220 predictions. Clearly, the evolving finite state
machines  now  primarily  predict  the  zeros.  Figures  30  through  36  show 
these  seven  experiments  in  terms  of  the  size  of  the  machines.  In  every 
case,  there  was  a  rapid  build-up  to  the  imposed  limit.  Conceivably,  a 
higher  limit  would  have  allowed  for  a  greater  predictive  fit  score; 
however,  in  the  limit,  the  results  would  be  the  same.  As  the  machines 
increase  in  size,  each  state  would  correspond  to  a  given  symbol  in  the 
environment.  Eventually,  there  would  be  a  steady  state  that  predicts 
only  zeros.  Figures  37  and  38  show  the  mean  and  two  sigma  limits  for 
these  seven  experiments. 
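
The reason a steady state predicting only zeros emerges can be seen from the environment itself. The sketch below assumes the sequence 101001000... continues as a one followed by an ever longer run of zeros; the function names are illustrative.

```python
def iq_environment(n):
    """First n symbols of 1, 0, 1, 0, 0, 1, 0, 0, 0, ..."""
    seq, zeros = [], 1
    while len(seq) < n:
        seq.append(1)
        seq.extend([0] * zeros)
        zeros += 1
    return seq[:n]

def always_zero_score(n):
    """Percent-correct (as a fraction) of a machine that always predicts zero."""
    return iq_environment(n).count(0) / n
```

The fraction of zeros grows with the length of the sequence, so under an all-or-none payoff the always-zero predictor approaches a perfect score even though it never predicts a one.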

Altered  Mutation  Variation 

Ten  experiments  were  performed  to  examine  the  worth  of  altering  the 
probabilities  of  mutation  in  order  to  improve  the  effectiveness  of  the 
evolutionary  process.  Here  the  same  binary  cyclic  environment  was  used 
with  a  thirty  percent  chance  of  adding  a  state  and  ten  percent  chance  of 
deleting  a  state.  Figures  39  through  44  indicate  the  cumulative  percent 
correct.  In  these  ten  experiments,  the  evolutionary  process  generated 
five  perfect  predictors.  As  stated  in  the  previous  Progress  Report,  no 
perfect  predictors  were  found  in  thirty  such  experiments.  Figure  45 
indicates the manner in which the states build up in these experiments.
Figures  46  and  47  indicate  the  mean  cumulative  percent  correct  scores  and 
two  sigma  limits,  respectively.  The  average  number  of  evaluated  offspring 
was  3,238.8  with  only  a  slight  variance. 


Predicting  Cycles  within 


A  "double  obverse"  environment  was  constructed  consisting  of  ten 
repetitions  of  the  above  binary  cycle  followed  by  ten  repetitions 
of  223445,  this  being  followed  by  another  ten  cycles  of  the  original 
binary  sequence.  Offspring  of  equal  worth  were  saved.  Five  experiments 
were  conducted  demonstrating  early  learning.  The  prediction  score  then 
reflected difficulty in learning the second cyclic component of the
environment, see Figures 48 through 52. Saving equally worthwhile
offspring leads to a rapid increase in machine complexity, see Figures 53
through 57. It is believed that the added complexity that results from
saving offspring of equal worth acts as a buffer to the coding that
predicts the original environment. As expected, therefore, when the
original environment was re-introduced, the evolutionary process quickly
redemonstrated its previous ability. Again, note that an increase in size
can markedly slow down the evolutionary process when it encounters a new
environment. Also note that the initial learning of the first sequence
was slower than previously demonstrated. This was due to the fact that the
evolutionary process "believed" it was dealing with a six-symbol
environment instead of merely a two-symbol environment (the initial machine
was expressed in a six-symbol language).

Similar  experiments  were  also  conducted  wherein  the  cycles  were  all 
within  the  same  four-symbol  language.  Here  the  sequence  0132  was  repeated 
fifteen  times  followed  by  331022  repeated  fifteen  times  with  a  return  to 
the  original  sequence  repeated  fifteen  times.  In  the  first  experiment, 
only  offspring  that  were  superior  to  their  parents  in  predictive  fit  were 
saved.  Here  the  evolutionary  process  discovered  a  perfect  predictor 
of the first cyclic sequence at the 20th prediction, see Figure 58. In all, 348
offspring were evolved to yield the four-state machine shown in Figure 59.
Note  that  only  a  single  state  machine  is  necessary  to  predict  this  first 
sequence. After the environment changed at the 51st prediction, the
evolutionary process showed a sharp drop in success but recovered to a
prediction capability of five correct out of the six symbols. At the 141st
prediction, the original environment returned, and a perfect predictor
was still evident. In 200 predictions, 6,585 offspring were evaluated. Throughout
the experiment, the machines remained of small size (three to four states)
because here only offspring superior to the parent were saved.

In  the  second  experiment,  the  offspring  were  saved  if  they  scored 
equal  to  or  better  than  the  parent.  Here  again,  the  evolution  indicated 
superior learning ability, perfectly predicting the cyclic environment on
only the eighth prediction (having seen only four cycles), see Figure 60.
However,  the  evolutionary  process  had  great  difficulty  in  learning  the 
second  environment  (due  to  the  added  complexity  gained  while  learning 
the  first  environment).  When  the  initial  environment  returned,  a  perfect 
predictor  was  still  in  evidence.  The  size  of  the  finite  state  machines 
quickly  grew  to  the  upper  limit  of  twenty-five  states.  This  experiment 
tends  to  confirm  the  notion  that  large  machines  slow  down  the  evolutionary 
process.  It  is  useful  to  compare  the  above  results  to  the  classical 
zeroth order prediction of the same environment, see Figure 61. Note
its  extremely  poor  performance. 
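
For comparison, a zeroth order predictor can be sketched in a few lines. The report does not define the term; the sketch assumes it means predicting whichever symbol has been observed most often so far, with ties broken in favor of the smallest symbol. The names are illustrative.

```python
def zeroth_order_score(env, n_symbols=4):
    """Fraction correct when always predicting the most frequent symbol seen so far."""
    counts = [0] * n_symbols
    hits = 0
    for actual in env:
        # Highest count wins; ties go to the smallest symbol (an assumption).
        guess = max(range(n_symbols), key=lambda s: (counts[s], -s))
        hits += (guess == actual)
        counts[actual] += 1
    return hits / len(env)

# The three-part four-symbol environment described in the text.
ENVIRONMENT = [0, 1, 3, 2] * 15 + [3, 3, 1, 0, 2, 2] * 15 + [0, 1, 3, 2] * 15
```

Because the symbols occur with nearly equal frequency, symbol counts carry almost no information about the next symbol, and `zeroth_order_score(ENVIRONMENT)` stays far below the scores obtained by the evolved machines.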

Crossover  Experiments 

J.H. Holland, in his book Adaptation in Natural and Artificial
Systems (1975), proposed the use of genetic operators. He believed that
simulated evolution could be improved by drawing an analogy to sexual
reproduction. Specifically, he suggested that two machines be "crossed
over" through a substitution of states. A large number of experiments have
been reported by Holland and others on the use of this technique...
without there being any reference to finite state machines as being the
embodiment of the evolving behavior.

Several  experiments  were  therefore  conducted  to  investigate  the 
worth  of  this  proposition.  The  first  fifteen  experiments  involved  the 
same  binary  cyclic  environment  and  used  a  two-state  crossover,  that  is, 
two arbitrary states of the best machine were substituted for two states
in  another  machine  of  similar  size.  The  probability  of  this  crossover 
was  chosen  to  be  fifty  percent  with  the  remaining  probabilities  uniformly 
distributed over the five modes of mutation. Figures 62 through 67
show that the best experiment discovered a perfect predictor at the 63rd
prediction. In that experiment, 2,976 offspring were evaluated, and the size of the evolving
machines rose steadily. The worst experiment did not discover a perfect
predictor in 196 predictions; 3,018 offspring were evaluated. The
increase in machine size was similar to that previously demonstrated.
While the mean worth (predictive fit) of the process consistently increased
(Figures 68 and 69), the mean percent correct of future predictions was
almost identical to that without crossover; see Figures 70 and 71. On
the average, 3,001.2 offspring were evaluated in 196 predictions.
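
The two-state crossover used in these experiments can be sketched as follows. The report does not say how states of the two machines are matched; the sketch assumes states are copied by index from the best machine into a copy of the other machine, with transitions wrapped so that they remain valid. A machine is represented as a table with `table[state][input] = (output, next_state)`, and all names are hypothetical.

```python
import random

def crossover(best, other, n_swap=2, rng=random):
    """Copy n_swap randomly chosen states of `best` into the same
    positions of a copy of `other` (index alignment is an assumption)."""
    child = [row[:] for row in other]
    n = min(len(best), len(other))
    for s in rng.sample(range(n), min(n_swap, n)):
        # Wrap next-state indices so they stay inside the child machine.
        child[s] = [(out, nxt % len(child)) for out, nxt in best[s]]
    return child
```

Note that each copied state keeps its transition targets, but those targets now refer to states of the other machine. This is one way to see why the function of the transplanted coding can change greatly even though the amount of coding does not.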

Crossover  was  also  examined  within  the  context  of  the  I.Q.  test 
environment.  Twenty  experiments  were  conducted,  each  performing  85 
predictions with an all-or-none payoff function. Figures 72 and 73 indicate
the  mean  cumulative  percent  correct  and  the  two  sigma  limits.  After  the  40th 
prediction,  the  process  properly  predicts  all  the  zeros,  although  with  a 
significant  variance.  However,  Figures  74  and  75  show  that  the  predictive 
fit  variation  around  the  mean  is  very  narrow.  All  machines  tend  to  fit 
the  previous  history  in  the  same  manner.  There  was  little  difference 
in  the  results  with  and  without  crossover. 

The  environment  was  then  predicted  using  the  asymmetric  payoff  function. 
Figures 76 through 79 show the result of eighteen experiments. The mean
worth  of  the  abilities  to  fit  previous  history  was  slightly  worse  than  that 
without  any  crossover.  The  poorest  results  asymptotically  fell  toward 
the  expected  value  of  the  environment.  The  experiment  with  the  best 
results showed some ability to predict the early ones.

As pointed out by Dr. Wirt Atmar, Holland's crossover technique
does not resemble the manner in which recombination occurs in nature.
Sexuality in natural organisms results in the recombination of alleles.
Alleles are defined as one of two or more alternative genes that control
the same characteristic and occupy the same place on similar chromosomes.
For Holland's technique to be effective, it must be assumed that a finite
state machine is analogous to a chromosome and that each state output
and state transition is analogous to a gene. But this is not correct.
Genes contain the instructions for generating specific physical functions.
They are much like subroutines. Here a finite state machine could be
thought of as a gene. However, the structure of a finite state machine
could never be thought of as a gene.

As  pointed  out  in  Elements  of  Biology  by  Charles  K.  Levy,  Addison 
Wesley  Publisher,  Reading,  Massachusetts,  1982,  changes  by  crossover  are 
not  true  mutations  "since  neither  the  amount  nor  the  function  of  genetic 
material  is  altered."  Under  Holland's  crossover,  while  the  amount  of  coding 
is  not  altered,  the  function  of  the  coding  is  greatly  altered.  The  result 
of  this  is  a  near-random  search  throughout  the  entire  space  of  possible 
solutions. As the size of the machines grows larger, the number of
states used in crossover increases. As shown in Figures 80 through 85,
when predicting the cyclic environment 012345676532210, without loss of
generality, the evolutionary processes which do not use crossover perform
significantly better than those using two-state and ten-state crossover.
Holland's  technique  effectively  destroys  the  link  between  parent  and 
offspring  that  is  necessary  for  an  evolving  process  to  succeed.  In  the 
words  of  Dr.  Atmar,  "While  his  original  thought  was  in  the  right  direction, 
the  technique  he  promotes  is  at  too  severe  a  scale.  True  sexuality  does 
not  destroy  (nor  create)  information;  it  only  shuffles  contending  subroutines, 
eventually  trying  almost  every  probable  combination  of  those  in  existence, 
retaining  only  those  combinations  found  to  be  most  beneficial. 

Even  at  this  far  milder  level  of  informational  reorganization,  evaluating 
the  "costs"  of  sexual  recombination  in  producing  inappropriately  behaving 
organisms has been a topic of substantial debate which has continued on
in biology for some time now. The debate is currently being resolved
in the following manner: the costs of sexual recombination are no longer
considered severe because (1) the contending alleles (subroutines) have
generally been found to be quite similar, and (2) there are not nearly
as many alleles contending for each site as was originally thought or
predicted.

Sexuality  is  not  a  mutational  phenomenon.  It  is,  rather,  a  method 
of  rapidly  sorting  through  the  effects  of  mutated  subroutines.  The 
advantages  of  sexuality  in  an  evolutionary  optimization  program  are  not 
to  be  underestimated.  However,  a  very  specific  structure  must  be  in 
place  before  these  advantages  are  obtainable." 


Identification  Experiments 

As  previously  indicated,  prediction  is  key  to  the  process  of 
identification.  Suppose  an  unknown  entity  is  responding  to  known  stimuli 
in a clearly observable manner. The task is to develop an explicit
representation for that transduction, this on the basis of the stimulus/response
pairs experienced to date. The predictor observes the sequence of stimulus/response
pairs and the most recent stimulus, then predicts the next response,
this process being repeated as each new stimulus is observed. The
predictor  must  develop  a  most  suitable  logic  in  terms  of  the  specific 
payoff  function  and  span  of  prediction.  If  this  span  is  the  single  time 
unit,  the  discovered  predictive  regularity  should  correspond  with  the 
underlying  logic  of  the  transducer.  Evolutionary  programming  provides  a 
basis  for  such  prediction  and  identification. 

To demonstrate this hypothesis, a series of pilot experiments was
conducted. Success in this regard might justify more formidable
experiments on the Cray computer. As a first step, the transducer or plant is
deterministic and fully controllable, that is, each of the possible
response symbols is generated by each of the states. Plants of increasing
complexity in terms of alphabet size and number of states were used. Once
a searching prediction score has been obtained, the predictive logic is
compared to the actual transduction as a test of the identification. In
these experiments, an all-or-none payoff function was used in that all
symbols are equally important, and no credit was to be given for nearness.
To ensure adequate exploration, random input sequences were used to drive
the plant.
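
The experimental setup can be sketched as below. It is only an illustration under stated assumptions: "fully controllable" is taken to mean that, in every state, each response symbol is produced by exactly one stimulus symbol, and the plant is a randomly generated Mealy machine. The function names are hypothetical.

```python
import random

def random_controllable_plant(n_states, n_symbols, rng):
    """plant[state][stimulus] = (response, next_state); in every state each
    response symbol is produced by exactly one stimulus (an assumption)."""
    plant = []
    for _ in range(n_states):
        outputs = list(range(n_symbols))
        rng.shuffle(outputs)
        plant.append([(outputs[a], rng.randrange(n_states))
                      for a in range(n_symbols)])
    return plant

def drive(plant, n_steps, n_symbols, rng):
    """Drive the plant with uniformly random stimuli; return (stimulus, response) pairs."""
    state, pairs = 0, []
    for _ in range(n_steps):
        stimulus = rng.randrange(n_symbols)
        response, state = plant[state][stimulus]
        pairs.append((stimulus, response))
    return pairs
```

Under these assumptions each response symbol occurs with probability 1/k for an alphabet of k symbols regardless of the plant's state, which gives the chance-level baseline against which the evolved predictors are judged.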

The first experiment required the identification of a two-symbol,
two-state  finite  state  machine,  see  Figure  86.  The  evolutionary  process 
properly  identified  this  finite  state  machine  at  the  second  prediction, 
this  after  evaluating  only  thirty  offspring. 

The next experiment required identification of a two-symbol, five-state
machine,  see  Figure  87.  Here,  in  254  predictions,  the  evolutionary  process 
correctly  predicted  80  percent  of  the  responses  and  had  a  predictive  fit 
score  of  90  percent,  see  Figures  88  and  89.  In  all,  32,376  offspring  were 
evaluated.  Figure  90  indicates  the  final  evolved  machine  consisting  of 
six  states.  It  is  tempting  to  compare  this  machine  with  the  one  shown  in 
Figure  87,  but  this  is  dangerous.  The  addition  of  even  a  single  state 
changes  the  meaning  of  all  other  states.  There  can  be  no  simplistic  state- 
to-state  comparison.  Note  that  a  fully  controllable  finite  state  machine 
driven  by  random  input  in  a  binary  alphabet  has  a  fifty  percent  chance  of 
responding  with  either  of  the  alphabet  symbols.  The  predictive  score 
should  be  fifty  percent.  Clearly,  the  evolutionary  process  does 
significantly  better. 

In  the  next  experiment,  a  two-symbol,  ten-state  machine  was  the 
unknown  transducer,  see  Figure  91.  Here,  the  evolutionary  process  was  able 
to  achieve  a  high  predictive  fit  score,  see  Figure  92,  and  predicted 
extremely well, see Figure 93. The final machine had fifteen states,
with 8,040 offspring evaluated in the 290 predictions. Note that
increased  complexity  of  behavior  may  not  be  simply  determined  by  the  number 
of  states.  Here  the  plant  has  a  relatively  simple  structure  which  was 
evidently  discovered  by  the  evolutionary  programming.  Presumably,  further 
evolution  would  finally  discover  the  computer  logic  of  this  machine. 

Next,  a  four-symbol  alphabet  was  used  driving  a  five-state  machine 
that  represented  the  unknown  transducer,  see  Figure  94.  Four  experiments 
were conducted; the percentage correct prediction ranged from 36 to 50, but
in every case, this was better than the expected 25 percent prediction by
zeroth  order  statistics.  Figures  95  to  101  show  the  predictive  fit  score 
and  percent  correct  for  each  of  these  experiments. 

An eight-state machine was then used as the plant, see Figure 102.
Four experiments were conducted. Even though the size of the machine had
been increased, the evolutionary process was still able to predict 30 to 40
percent correct, see Figures 103 to 106. It is also of interest to
examine the predictive fit score for these experiments, see Figures 107 to 110.
The 60 to 70 percent worth indicates that if the evolutionary process had
been given more time between predictions, superior response prediction
could have been achieved. In each of these experiments, there was a rapid
increase in the size of the machines, see Figures 111 to 114, even though
equally valued offspring were not being saved. An average of 11,422.5
offspring were evaluated in making the 290 predictions. The final
predictive machine of the first experiment consisted of ten states, see Figure 115.

Finally,  two  experiments  were  conducted  using  an  eight-symbol, 
five-state  machine  as  the  plant,  see  Figure  116.  Figures  117  and  118 
indicate the percent correct predictions to be 20 to 25 percent, this being
twice  the  level  that  would  be  expected  on  the  basis  of  a  zeroth  order 
prediction  (12.5%).  Figures  119  and  120  indicate  the  predictive  fit  score 
for  these  two  experiments.  The  results  range  from  45  to  50  percent  worth. 
Evidently,  five  generations  per  prediction  was  insufficient.  Greater 
exploration  before  predictions  would  have  increased  the  predictive  fit 
worth and the percent correct. Once again, there was a steady increase in
the  size  of  the  evolving  machine,  see  Figures  121  and  122.  The  final 
machines  range  from  16  to  18  states. 

These  experiments  indicate  successful  identification  through 
evolutionary  programming.  The  logic  of  the  unknown  transducer  was 
represented  by  finite  state  machines  of  greater  size  than  required.  This 
is  especially  true  with  a  larger  alphabet.  However,  no  penalty  for 
complexity  was  invoked.  It  should  be  possible  to  evolve  machines  of 
similar  size  to  the  unknown  plant,  but  there  might  be  some  cost  for  this 
decreased  specificity. 

On  the  basis  of  these  results,  it  seems  worthwhile  to  explore  the 
identification of time varying and noisy plants through pilot experiments,
and then to design experiments for predictive identification using the
Cray computer.


CONCLUSION


Evolutionary  Programming  has  now  been  demonstrated  as  a  versatile 
means  for  the  prediction  of  nonstationary  time  series  and  the  identification 
of  an  unknown  plant.  It  is  fully  recognized  that  much  remains  to  be  done 
to dimensionalize the program as a function of the presented problem in
both prediction and identification. Arrangements are underway to use the
NASA Ames Cray computer for such work. In the meantime, further pilot
experiments  are  planned  to  guide  these  larger-scale  demonstrations. 

It  is  now  considered  appropriate  to  explore  the  application  of 
prediction  and  identification  to  real  world  problems  in  a  preliminary 
manner.  Medical,  financial  and  other  data  sources  are  being  reviewed  in 
this regard. It is of particular interest to focus attention on
situations wherein the least mean squared payoff function is notably inappropriate.
This  is  surely  the  case  for  epidemiologic  and  economic  time  series  where 
equally  correct  predictions  are  not  of  equal  worth,  and  the  cost  of  a 
"cry  wolf"  error  is  clearly  different  from  the  cost  of  a  missed  catastrophe. 
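
The point about unequal worth can be made concrete with a small payoff table. The numerical values below are purely hypothetical and are chosen only to show how an asymmetric payoff can reverse the ranking given by percent correct; the names are illustrative.

```python
# Hypothetical payoff table keyed by (predicted, actual);
# 0 = quiet period, 1 = catastrophe. The values are illustrative only.
PAYOFF = {
    (0, 0): 1.0,    # correctly predicted quiet period
    (1, 1): 10.0,   # correctly predicted catastrophe
    (1, 0): -1.0,   # "cry wolf" error
    (0, 1): -20.0,  # missed catastrophe
}

def total_payoff(predictions, actuals, payoff=PAYOFF):
    """Sum of payoffs over a sequence of predictions."""
    return sum(payoff[(p, a)] for p, a in zip(predictions, actuals))

def percent_correct(predictions, actuals):
    """All-or-none score as a fraction."""
    return sum(p == a for p, a in zip(predictions, actuals)) / len(actuals)
```

For a history of nine quiet periods followed by one catastrophe, always predicting quiet is correct nine times out of ten yet earns a total payoff of -11 under this table, while always raising the alarm is correct only once in ten yet earns +1.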

Plans  are  being  made  to  move  from  identification  to  control  through  a 
series  of  pilot  experiments.  Success  in  this  regard  might  well  influence 
the  design  of  weapon  systems. 


ADDRESSING

THE  TRAVELING  SALESMAN  PROBLEM 
THROUGH  ADAPTATION 

By  David  Fogel 


INTRODUCTION 

The traveling salesman problem has received a great deal of attention
in recent years. The task is to arrange a tour of n cities such that each
city is visited only once and the length of the tour (or some other cost
function) is minimized. Here is a simple yet recalcitrant combinatorial
optimization problem. For an exact solution the only known algorithms
require the number of steps to grow at least exponentially with the
number of elements in the problem. Brute force finding of the shortest
path by which a traveling salesman can complete a tour of n cities
requires compiling a list of (n-1)!/2 alternative tours...a number that
grows faster than any finite power of n. The task quickly becomes
unmanageable.
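The growth of (n-1)!/2 can be tabulated directly. The following is a minimal sketch rather than anything taken from the report: the function name `distinct_tours` is an illustrative assumption, and the count assumes n >= 3 with tours that differ only in starting city or direction of travel treated as the same tour.

```python
from math import factorial


def distinct_tours(n: int) -> int:
    """Count the closed tours of n cities, (n-1)!/2, ignoring start city and direction."""
    if n < 3:
        raise ValueError("the (n-1)!/2 count assumes at least three cities")
    return factorial(n - 1) // 2


# Tabulate the count for a few problem sizes used later in the text.
for n in (5, 10, 24, 50):
    print(n, distinct_tours(n))
```

Even for the 24 city problem discussed later, the list of alternative tours already exceeds 10^22 entries.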

BACKGROUND

Two recent papers¹ addressed the traveling salesman problem through
use of the genetic algorithm as proposed by J.H. Holland in 1975.²
This algorithm is an offshoot of the evolutionary programming concept
offered by L.J. Fogel in 1962,³ then demonstrated in his doctoral
dissertation⁴ and described in the book Artificial Intelligence through
Simulated Evolution.⁵ Here, intelligent behavior was viewed as requiring
prediction of an environment, then the use of such predictions for the sake
of its control (at least to the extent possible). The logical process of
iterative mutation and selection of behavior is simulated in fast time to
eventually evolve a logic most suitable for resolving the given problem.

1- "Proceedings of an International Conference on Genetic Algorithms and Their
Applications," John J. Grefenstette, Editor, Carnegie-Mellon University, July 1985.

The behavior of each artificial organism is portrayed by a finite state
machine... a mathematical function that does not constrain the represented
transduction. It need not be linear, passive, or without hysteresis. The
original "machine" (an arbitrary logic or a "hint") is measured in its ability
to predict each next event in its "experience" with respect to whatever
payoff function has been prescribed. An offspring is now created through
random mutation of this "parent" machine. It is scored in a similar manner
to the parent in predictive ability. If the parent is better than its
offspring, the parent is used to generate another offspring. If, however,
the offspring is better than the parent, the offspring becomes the new
parent. This assures non-regressive evolution.


2- Adaptation in Natural and Artificial Systems, J.H. Holland, University of Michigan
Press, Ann Arbor, 1975.

3- "Autonomous Automata," L.J. Fogel, Industrial Research, February 1962.

4- "On the Organization of Intellect," L.J. Fogel, Ph.D. Dissertation, UCLA, 1964.

5- Artificial Intelligence through Simulated Evolution, L.J. Fogel, A.J. Owens, M.J. Walsh, John
Wiley & Sons, New York, 1966.
An actual prediction is made when the predictive fit score demonstrates
that a sufficient level of credibility has been achieved. The
surviving machine generates a prediction, indicates the logic of this
prediction and becomes the progenitor for the next sequence of progeny,
this in preparation for the next prediction. In this way, randomness is
selectively incorporated into the surviving logic. The sequence of
predictor machines constitutes phyletic learning... inductive
generalization... generation of new hypotheses concerning the relevant
regularities found within the experienced environment, this in the light of
the given payoff function. Prediction can be used for the purpose of
identifying an unknown transducer. Feedback of the evolved model then
forms a basis for control.

Rather than describe each organism only in terms of its behavior,
Holland chose to evolve the code which generates such organisms.
According to D.H. Ackley,⁶ Holland's genetic algorithms search a parameter
space where "any point in the parameter space can be represented as an n
bit vector. The technique manipulates a set of such vectors to record
information" about the parameter space. "There are two primary operations
applied to the population by a genetic algorithm. Reproduction changes the
contents of the population by adding copies of genotypes with
above-average figures of merit. In addition to this, it is necessary to
generate new, untested genotypes and add them to the population, else the
population will simply converge on the best one it started with. Crossover
is the primary means of generating plausible new genotypes for addition to
the population."

6- "A Connectionist Algorithm for Genetic Search," D. Ackley, in Proceedings of an
International Conference on Genetic Algorithms and Their Applications,
J.J. Grefenstette, Editor, Carnegie-Mellon University, July 1985.
As defined by Holland, crossover takes two structures,
$A_1 = a_{11} a_{12} \ldots a_{1n}$ and $A_2 = a_{21} a_{22} \ldots a_{2n}$,
and at a random point $x$ between 1 and n, exchanges the set of attributes
to the right of this position yielding offspring of the form:
$A^* = a_{11} a_{12} \ldots a_{1x} a_{2(x+1)} \ldots a_{2n}$. Ackley
continues to state: "This 'offspring' is added to the population, displacing
some other genotype according to various criteria where it has the
opportunity to flourish or perish depending on its fitness." "Mutation
provides a chance for any allele to be changed to another randomly chosen
value." "If the mutation rate is too low, possibly critical alleles missing
from the initial population will have only a small chance of getting...into
the population. However, if the probability of a mutation is not low enough,
information... will be steadily lost to random noise."
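The crossover just defined can be written in a few lines. This is only a sketch under stated assumptions: the name `simple_crossover` and its parameters are illustrative, structures are held as Python lists, and when no crossover point is supplied the point is drawn from 1 to n-1 (an assumption made here so that both parents contribute; the text allows any point between 1 and n).

```python
import random


def simple_crossover(a1, a2, x=None, rng=random):
    """Keep attributes 1..x of a1 and take attributes x+1..n from a2."""
    if len(a1) != len(a2):
        raise ValueError("both structures must have the same length")
    n = len(a1)
    if x is None:
        # Assumption: choose x in 1..n-1 so that each parent contributes.
        x = rng.randint(1, n - 1)
    if not 1 <= x <= n:
        raise ValueError("crossover point must lie between 1 and n")
    return list(a1[:x]) + list(a2[x:])


parent_1 = [1, 1, 1, 1, 1, 1]
parent_2 = [0, 0, 0, 0, 0, 0]
print(simple_crossover(parent_1, parent_2, x=2))  # [1, 1, 0, 0, 0, 0]
```

With x equal to n the offspring is simply a copy of the first parent.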

Holland likened the actual code being mutated to that of the genetic
code that defines a given organism. While Fogel, et al. only used small
degrees of "background" mutation, Holland examined the operations of gene
"crossover" and "inversion" among other actual biologic genetic
recombinations. Although Holland's work went largely unnoticed for some
time, today a great deal of attention is being given to genetic algorithms.

At the above referenced conference D. Goldberg and R. Lingle, Jr.⁷
offered several observations:

"1) Simple genetic algorithms work well in problems which can be
coded so the underlying building blocks (highly fit, short defining
length schemata) lead to improved performance.

2) There are problems (more properly codings for problems) that are
GA-hard -- difficult for the normal reproduction + crossover +
mutation processes of the simple genetic algorithm.

3) Inversion is the conventional answer when genetic algorithmists
are asked how they intend to find good string ordering, but inversion
has never done much in empirical studies to date.

4) Despite numerous rumored attempts, the traveling salesman
problem has not succumbed to genetic algorithm-like solution."

7- "Alleles, Loci, and the Traveling Salesman Problem," D.E. Goldberg, R. Lingle, Jr., in
Proceedings of an International Conference on Genetic Algorithms and Their
Applications, J.J. Grefenstette, Editor, Carnegie-Mellon University, July 1985.

The  authors  then  suggested  a  new  type  of  crossover  operator, 
"partially-mapped  crossover  (PMX),"  that  would  lead  to  a  more  efficient 
solution  of  the  traveling  salesman  problem. 

Specifically, consider two possible codings of a tour of eight cities,
A1 and A2, a return to the initial city being implicit:

A1: 3 5 1 2 7 6 8 4
A2: 1 8 5 4 3 6 2 7

PMX would proceed as follows: Two positions are determined randomly
along the A1 coding. The actual cities located between these positions
along A1 are exchanged with the cities located between the same
positions along A2. For example, if the positions three and five are chosen,
the sub-coding along A1 is 1-2-7, and the sub-coding along A2 is 5-4-3.
Each of these cities is then exchanged, leading to the new tours, A*1 and
A*2:

A*1: 7 1 5 4 3 6 8 2
A*2: 5 8 1 2 7 6 4 3.
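The exchange in this example can be reproduced with a short sketch. The function name `pmx` and the 1-based, inclusive `first` and `last` arguments are illustrative assumptions; the code performs the exchange by position-wise swaps inside each tour, which is one straightforward reading of the description above rather than a transcription of Goldberg and Lingle's program.

```python
def pmx(parent_a, parent_b, first, last):
    """Partially-mapped crossover between positions first..last (1-based, inclusive)."""
    if not 1 <= first <= last <= len(parent_a) or len(parent_a) != len(parent_b):
        raise ValueError("positions must satisfy 1 <= first <= last <= n")

    def exchange(receiver, donor):
        child = list(receiver)
        for i in range(first - 1, last):
            wanted = donor[i]            # city the donor holds at this position
            j = child.index(wanted)      # where that city currently sits in the child
            child[i], child[j] = child[j], child[i]
        return child

    return exchange(parent_a, parent_b), exchange(parent_b, parent_a)


a1 = [3, 5, 1, 2, 7, 6, 8, 4]
a2 = [1, 8, 5, 4, 3, 6, 2, 7]
child_1, child_2 = pmx(a1, a2, 3, 5)
print(child_1)  # [7, 1, 5, 4, 3, 6, 8, 2]
print(child_2)  # [5, 8, 1, 2, 7, 6, 4, 3]
```

Because every step is a swap within a single tour, each offspring remains a valid tour.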

They reported two experiments on ten cities where the PMX operator
enabled the search to efficiently find either the absolute or near optimal
solution. Goldberg and Lingle stated that this operator was more complex
than "simple crossover," as proposed by Holland. As will be shown, in fact,
it is not.

In another paper by J.J. Grefenstette, R. Gopal, B. Rosmaita and
D. Van Gucht,⁸ Holland's "simple crossover" was utilized. This required the
formation of a special coding structure. Clearly, using this operator on
two valid tours could result in an "offspring" that was not a valid tour. As
A.K. Dewdney⁹ related, the authors' method for devising the appropriate
coding was ingenious.

8- "Genetic Algorithms for the Traveling Salesman Problem," J.J. Grefenstette, R. Gopal, B.
Rosmaita, D. Van Gucht, in Proceedings of an International Conference on Genetic
Algorithms and Their Applications, J.J. Grefenstette, Editor, Carnegie-Mellon
University, July 1985.

9- "Computer Recreations: Exploring the field of genetic algorithms in a primordial computer
sea full of flibs," A.K. Dewdney, Scientific American, November 1985.


"The representation for a five-city tour such as a, c, e, d, b turns out
to be 12321. To obtain such a numerical string reference is made to
some standard order for the cities, say, a, b, c, d, e. Given a tour such
as a, c, e, d, b, systematically remove cities from the standard list
in the order of the given tour, remove a, then c, e and so on. As each
city is removed from the special list, note its position just before
removal, a is first, c is second, e is third, d is second and, finally, b
is first. Hence the chromosome 12321 emerges. Interestingly, when
two such chromosomes are crossed over, the result is always a tour."
Dewdney continued further to report that unfortunately the experiments
with this representation were "not very encouraging." The authors
conducted larger experiments than those of Goldberg and Lingle, including
50, 100 and 200 cities. In the three reported experiments, after a large
number of trials (approximately 14000, 20000 and 25000, respectively),
the best tours were still far away from the expected optimal solutions.
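The coding Dewdney describes can be sketched as follows. The function names `encode`, `decode` and `crossover` are illustrative assumptions, as is the second tour used in the demonstration; only the five-city example a, c, e, d, b and its chromosome 12321 come from the text.

```python
def encode(tour, standard):
    """Record each city's 1-based position in the shrinking standard list."""
    remaining = list(standard)
    chromosome = []
    for city in tour:
        index = remaining.index(city)
        chromosome.append(index + 1)   # position just before removal
        del remaining[index]
    return chromosome


def decode(chromosome, standard):
    """Rebuild the tour by removing cities at the recorded positions."""
    remaining = list(standard)
    return [remaining.pop(position - 1) for position in chromosome]


def crossover(c1, c2, x):
    """Simple crossover: first x genes of c1, remaining genes of c2."""
    return c1[:x] + c2[x:]


standard = ["a", "b", "c", "d", "e"]
first = encode(["a", "c", "e", "d", "b"], standard)
second = encode(["b", "a", "d", "c", "e"], standard)   # hypothetical second tour
print(first)                                           # [1, 2, 3, 2, 1]
print(decode(crossover(first, second, 2), standard))   # ['a', 'c', 'd', 'b', 'e']
```

The offspring is always a tour because the gene at each position is bounded only by how many cities remain at that step, and that bound is the same for both parents.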

At this point it is natural to ask "why?". After all, the traveling
salesman problem only requires discovery of a logical pattern. This seems
completely analogous to what occurs in nature. If the crossover of genes
works in natural evolution, why shouldn't it work here? The answer is, in
fact, as noted in observation #2 by Goldberg and Lingle: The traveling
salesman problem is GA-hard -- difficult for the normal reproduction +
crossover + mutation. This is due to the fact that Holland's crossover
operation does not mimic the biologic crossover of genes.

As defined by C.K. Levy,¹⁰ crossover is the phenomenon where "old
linkages between genes on homologous chromosomes are broken, and new
linkages are established. Genes that reside on the same chromosome and
move together are said to be 'linked.' A linkage group is any group of genes
physically linked on one chromosome." Levy goes on to state: "Changes in

10 - Elements of Biology, C.K. Levy, third edition, Addison-Wesley Publishing Company Inc.,
Reading, Massachusetts, 1962.


Crossover allows for different combinations of alleles. Alleles, by
definition, control the same characteristic and occupy the same place on
similar chromosomes. Holland's crossover treats the entire tour as a
chromosome and each city in a tour as a gene. While Holland's crossover
does not change the amount of coding, it greatly alters the function of the
coding! A more appropriate biologic interpretation of a tour would be that
it is itself a gene. Crossover inside a gene is a non sequitur. The tour is
absolutely not analogous to a chromosome and each city in a tour is not
analogous to a gene. These relations are in fact anomalous.

The result of Holland's crossover, therefore, is a near random search
throughout the entire space of possible tours. Perhaps Dewdney said it
best when he stated that by using crossover "there is so much juggling of
genes and cracking of chromosomes that...(a parent)...is hard put to
recognize its own grandchildren." This is, of course, the very essence of
the difficulty! Adaptive plans must retain already made advances in order
to ensure that an optimal solution will be found. As the number of cities
grows larger, Holland's crossover effectively destroys the link between
each parent and its offspring. The results can even be worse than a
complete enumeration of all possible tours (reference the Addendum).

• - Underline added by this author.


AN  ALTERNATIVE  APPROACH 

As an alternative solution, consider the Adaptive Algorithm, so named
because it does not include any of the pseudo-genetic operators that
Holland has suggested. In this algorithm, which is equivalent to
evolutionary programming restricted to single state machines, only slight
mutations are made to an existing tour by removing just one city from a
given list and replacing it in a different randomly chosen position. This
mutation is only slightly more complicated than the simplest possible
mutation...swapping adjacent cities. It is clearly less complex than both
the PMX operator and Holland's crossover. Through multiple mutation, this
single alteration can be made equivalent to either of these crossover
operators.
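The single-city mutation described above takes only a few lines. This is a minimal sketch: the name `reinsert_mutation` and the optional `rng` argument are illustrative assumptions, and the tour is assumed to be a list of distinct city labels.

```python
import random


def reinsert_mutation(tour, rng=random):
    """Remove one randomly chosen city and replace it at a different position."""
    n = len(tour)
    if n < 2:
        raise ValueError("a tour needs at least two cities to be mutated")
    source = rng.randrange(n)            # position of the city to remove
    target = rng.randrange(n - 1)        # new position, drawn from the other n-1 slots
    if target >= source:
        target += 1                      # skip the original position
    child = list(tour)
    city = child.pop(source)
    child.insert(target, city)
    return child


parent = [3, 5, 1, 2, 7, 6, 8, 4]
print(reinsert_mutation(parent, random.Random(0)))
```

Moving a city by exactly one position reduces to the swap of adjacent cities mentioned in the text.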

According to Holland: "If successive populations are produced by
mutation alone (without (genetic) reproduction), the result is a random
sequence of structures drawn from (all possible structures)."¹¹ This is
only partly correct. The Adaptive Algorithm does result in a random
search, but only in a portion of the space relatively close to the parent
that generates the offspring. This increases the effectiveness of the
algorithm dramatically as it allows for the retention of advances.


11 - Adaptation in Natural and Artificial Systems, J.H. Holland, University of Michigan Press,
Ann Arbor, 1975.


But more is still required. Not only must advances be retained but
"dead-ends" must be circumvented. Because there is a finite number of
offspring that can be generated through mutation, there might well be
stagnation on a local optimum. To prevent this it is useful to randomly
alter the adaptive topography* that is being searched. This can be
accomplished in various ways. One of these is to occasionally allow for
the survival of offspring that are slightly worse than their parents. In
effect, the scoring function is made "noisy." What results is analogous to
the searching of a maze; when a dead-end is reached some backtracking is
allowed and the overall search is reinitiated.

Unfortunately, the topography is much like an upside-down bed of
nails, with some nails being longer (better) than others. From any given
nail, it is possible to travel to any of n(n-1) other nails in a single
mutation. Thus, unlike a maze, when the evolving phyletic line reaches a
non-optimal nail from which no single mutation results in a better tour, it
is impossible to determine the "direction" in which to backtrack. A given
nail can be reached in a multitude of ways; therefore, backtracking is
allowed in any direction. To further aid in the circumvention of stagnation,
the degree of "poorer quality" offspring that are occasionally accepted
should gradually be reduced as the adaptive process proceeds.
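The search loop described in this section might be sketched as follows. The report does not state the form of the noise or its reduction schedule, so several details here are assumptions: the names `tour_length`, `reinsert_mutation` and `adaptive_search`, a uniformly distributed tolerance expressed as a fraction of the parent's tour length, a linear reduction of that tolerance to zero, and Euclidean distances between hypothetical city coordinates.

```python
import math
import random


def tour_length(tour, coords):
    """Length of the closed tour, the return to the first city being implicit."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def reinsert_mutation(tour, rng):
    """Remove one city and replace it at a different randomly chosen position."""
    source = rng.randrange(len(tour))
    target = rng.randrange(len(tour) - 1)
    if target >= source:
        target += 1
    child = list(tour)
    child.insert(target, child.pop(source))
    return child


def adaptive_search(coords, iterations=5000, initial_tolerance=0.05, seed=0):
    """Single parent, single offspring search with a noisy, shrinking acceptance rule."""
    rng = random.Random(seed)
    parent = list(range(len(coords)))
    rng.shuffle(parent)
    parent_len = tour_length(parent, coords)
    best, best_len = parent, parent_len
    for step in range(iterations):
        child = reinsert_mutation(parent, rng)
        child_len = tour_length(child, coords)
        # Assumed schedule: the allowed worsening shrinks linearly to zero.
        tolerance = initial_tolerance * (1.0 - step / iterations)
        slack = rng.uniform(0.0, tolerance) * parent_len
        if child_len <= parent_len + slack:      # occasionally accepts a worse child
            parent, parent_len = child, child_len
            if child_len < best_len:
                best, best_len = child, child_len
    return best, best_len


# Hypothetical example: twelve cities on the periphery of a 4 by 2 rectangle.
cities = ([(x, 0) for x in range(5)] + [(4, 1)] +
          [(x, 2) for x in range(4, -1, -1)] + [(0, 1)])
print(adaptive_search(cities, iterations=2000, seed=1))
```

Setting `initial_tolerance` to zero recovers the strictly non-regressive rule, in which an offspring survives only if it is no worse than its parent.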

EXPERIMENTAL FINDINGS

Experiments were performed to demonstrate the effectiveness of the
Adaptive Algorithm. Initially, 128 independent trials were performed on a
24 city traveling salesman problem where the cities were positioned on
the periphery of a rectangle. Clearly, the minimum length tour is equal to
the perimeter of the rectangle, here, 250. Of the 128 trials, 90.625% found
the optimum solution in an average of 5297.48 iterations, see Figure 1, the
maximum number of iterations being arbitrarily set at 14,000. Figure 2
indicates the results of the remaining 9.375% of the trials. Here, at least
temporarily, the evolving tours were trapped on a local optimum. Despite
the seemingly non-complex arrangement of cities, the numerous local
optima make this traveling salesman problem difficult.

* - "Adaptive Topography" refers to the scoring function on the hyperspace of possible
"organisms."

The cities were then distributed at random. First, 19 experiments
were conducted requiring a tour of 50 cities. The cities were redistributed
for each experiment. In each, no optimum tours were discovered in 20,000
iterations, but it was clear that the evolutionary process was "solving the
problem." Figures 3 through 21 indicate the results of each experiment.
Figure 22 indicates a typical example of the evolutionary process
discovering more and more suitable tours as offspring are evaluated. Note
that "backtracking" plays an integral part of the search.

Experiments were then performed requiring a tour of 100 cities under
similar conditions. Again, while none of the eight experiments found a
perfect tour, the evolutionary process performed well. Figures 23 through
30 indicate the results of the eight experiments while Figure 31 indicates
the reduction in tour length as offspring are evaluated.
Another experiment required a tour of 90 cities. Here, ten groups of
nine  cities  were  randomly  placed  on  the  coordinate  grid.  The  process  was 
allowed  to  evolve  32,000  offspring.  While  the  optimum  solution  remained 
undiscovered,  it  is  of  interest  to  note  that  the  problem  was  evidently 
addressed  at  two  distinct  levels.  The  evolutionary  process  initially  solved 
the  problem  at  a  gross  level,  discovering  the  minimum  tour  between  the 
groups  of  cities,  see  Figure  32.  Insufficient  time  was  allowed  to  sort  out 
the  problem  at  a  finer  level  of  detail. 

Finally,  an  extremely  large  traveling  salesman  problem  was  analyzed. 
Here,  256  cities  were  randomly  distributed.  The  previous  results  indicated 
that  the  Adaptive  Algorithm  would  not  discover  the  optimum  solution; 
however,  in  only  10,000  iterations  it  reduced  the  initial  tour  length  by 
roughly  50%.  Figure  33  indicates  the  surviving  tour  after  evaluating 
10,000 offspring while Figure 34 indicates the success of the evolutionary
process in discovering better and better tours. The available
computation time limited the analysis; however, the results were certainly
encouraging.

CONCLUSION 

Clearly,  the  Adaptive  Algorithm  is  an  effective  method  for  addressing 
the  traveling  salesman  problem.  Several  important  conclusions  can  be 
drawn  from  the  previous  experiments: 

• "Sophisticated" mutation operations are not only unnecessary,
but are detrimental. The experiments point up the necessity for
maintaining a substantial link between parent and offspring. The
more sophisticated and complex mutation operations destroy this
link. The PMX operation may perform generally superior to
Holland's crossover operation because it tends to retain more
information during each generation.

•  There  is  a  beneficial  effect  of  using  a  noisy  payoff  function. 
The  concept  of  a  noisy  payoff  function  is  similar  to  that  suggested 
by S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi,¹² for optimizing
simulated annealing, but it is not necessary to resort to such
specific analogies. In a constantly changing environment, the
rewards and penalties for different behaviors vary. The search for
better and better solutions is never ending. Evolution is a
continuous process with no truly optimum solution.

•  Further,  it  appears  unlikely  that  any  specific  noisy  payoff 
function  exists  that  will  allow  discovery  of  the  optimum  solution 
in  every  traveling  salesman  problem.  Each  such  problem  offers  a 
different  adaptive  topography;  therefore  the  appropriate 
distribution  and  amount  of  noise  cannot  be  determined  a  priori. 


Although  no  single  Adaptive  Algorithm  can  optimally  solve  every 
traveling  salesman  problem,  a  unique  Adaptive  Algorithm  can  be  developed 
to  address  each  traveling  salesman  problem  in  a  very  efficient  manner. 

12 - "Optimization by Simulated Annealing," S. Kirkpatrick, C.D. Gelatt Jr., M.P. Vecchi, Science,
May 1983.
Addendum

A completely random search (with replacement) will take roughly
twice as long to find the optimum solution as an enumerative search
(without replacement). To show this, consider the following two theorems:

Theorem 1: If there are B possible solutions and only one
optimum solution, the expected number of trials that must be
made before the optimum solution is found, using an enumerative
search, assuming one trial is made at a time, is equal to
(B+1)/2.

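Proof: In an enumerative search, sampling is made without replacement.
The probability, therefore, of discovering the optimum solution on any
given trial is equal to the product of the probabilities of not discovering
the optimum solution on any prior trial multiplied by the reciprocal of the
number of untried solutions. The expected number of trials that would have
to be examined before finding the optimum solution would therefore be:

$$
\begin{aligned}
\sum x f(x) &= 1 \cdot B^{-1} + 2\left[\frac{B-1}{B}\right](B-1)^{-1} + 3\left[\frac{B-1}{B}\right]\left[\frac{B-2}{B-1}\right](B-2)^{-1} \\
&\quad + \cdots + (B-1)\left[\frac{B-1}{B}\right]\cdots\left[\frac{2}{3}\right]\left[\frac{1}{2}\right] + B\left[\frac{B-1}{B}\right]\cdots\left[\frac{2}{3}\right]\left[\frac{1}{2}\right] \cdot 1 \\
&= 1 \cdot B^{-1} + 2 B^{-1} + 3 B^{-1} + \cdots + (B-1) B^{-1} + B B^{-1} \\
&= B^{-1}\left(1 + 2 + 3 + \cdots + (B-1) + B\right) \\
&= B^{-1}\left[\frac{B(B+1)}{2}\right] \\
&= \frac{B+1}{2}. \qquad \text{Q.E.D.}
\end{aligned}
$$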
Theorem 2: If there are B possible solutions and only one
optimum solution, the expected number of trials that must be
made before the optimum solution is found, in a completely
random search, assuming one trial is made at a time, is equal to
B.

Proof: In a completely random search, sampling is made with
replacement. The probability, therefore, of discovering the optimum
solution on any given trial is equal to the product of the probabilities of
not discovering the optimum solution on any previous trial multiplied by
the reciprocal of the total number of possible solutions. The expected
number of trials that would have to be examined before finding the optimal
solution would therefore be:

$$
\begin{aligned}
\sum x f(x) &= 1 \cdot B^{-1} + 2\left[\frac{B-1}{B}\right]B^{-1} + 3\left[\frac{B-1}{B}\right]^{2}B^{-1} + \cdots \\
&= B^{-1}\left(1 + 2\left[\frac{B-1}{B}\right] + 3\left[\frac{B-1}{B}\right]^{2} + \cdots\right) \\
&= B^{-1}\left[\frac{1}{1-(B-1)/B}\right]^{2} \\
&= B. \qquad \text{Q.E.D.}
\end{aligned}
$$
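Both theorems can be checked numerically. The sketch below is illustrative only: the function names are assumptions, the enumerative sum is evaluated exactly from the product of probabilities given in the proof of Theorem 1, the infinite series of Theorem 2 is truncated after a fixed number of terms, and a small seeded simulation is included for comparison.

```python
import random
from fractions import Fraction


def expected_enumerative(b):
    """Exact expectation for search without replacement, term by term."""
    expectation = Fraction(0)
    not_found_yet = Fraction(1)
    for trial in range(1, b + 1):
        untried = b - trial + 1
        expectation += trial * not_found_yet * Fraction(1, untried)
        not_found_yet *= Fraction(untried - 1, untried)
    return expectation


def expected_random(b, terms=5000):
    """Truncated series for search with replacement; approaches b as terms grows."""
    miss = (b - 1) / b
    return sum(trial * miss ** (trial - 1) / b for trial in range(1, terms + 1))


def simulate(b, with_replacement, runs=20000, seed=0):
    """Average number of trials needed to hit solution 0 among b candidates."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        if with_replacement:
            trials = 1
            while rng.randrange(b) != 0:
                trials += 1
        else:
            order = list(range(b))
            rng.shuffle(order)
            trials = order.index(0) + 1
        total += trials
    return total / runs


B = 20
print(expected_enumerative(B), expected_random(B))
print(simulate(B, with_replacement=False), simulate(B, with_replacement=True))
```

For B = 20 the exact values are 21/2 and 20, in line with the statement that the completely random search takes roughly twice as long.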